Prediction and Retrodiction
by  Satosi  Watanabe 

An attempt is made within the framework of the accepted quantum
physics to achieve the maximum parallelism between prediction
(inference of the future observational data from the present
ones) and retrodiction (inference of the past observational data
from the present ones). To implement this program, it is shown
that the "retrodictive state function" (extrapolation of the
present data to the past) can be just as useful as the ordinary
"predictive state function" (extrapolation of the present data to
the future). This leads to a formalism in which time-reversal
becomes a linear transformation and double time-reversal becomes
a c-number. In spite of all this formal symmetry, it can be
shown that the actual success of a retrodiction depends on the
satisfaction of an additional condition which is not required in
prediction, and which is not always fulfilled. From the same
point of view, a logical loophole is pointed out in the
indiscriminate application of the H-theorem to the past. The so-called
irreversibility of observation is interpreted in terms of the
decrease of "information" in the process of inference.


#1.  Introduction 

In accordance with its expected role in human activities, physical
theory is pre-eminently a predictive instrument. Man is, however, not
immune to the temptation of the adventure of guessing with the same
instrument what happened in the past outside the reach of his own observation.
In the non-statistical domain of classical physics, retrodiction must be
in principle just as successful as prediction. However, in statistical
applications of classical physics and in quantum physics, a careful study
is needed to determine the confirmability of an attempted retrodiction.
The present paper is intended to provide an answer to some of the rudimentary
questions in this rather neglected field of intellectual interest.
Although some new points of view and a new formalism are introduced, the
content of this paper will remain perfectly faithful to the accepted
premises of classical and quantum physics. It should be noted that
retrodiction is a question defined differently from the so-called time-reversal,
although it is related to it in a certain way which will become clear
in our Sections 3 and 5.

There have been at least three circumstantial incentives which
motivated undertaking this work. In the first place, it was emphasized by
the author in a previous paper [1] that an essential difference between
classical physics and quantum physics lies in the fact that in the latter the
result of an observation can be used as the initial condition of the
"state" immediately after the observation but not as the final condition
of the "state" immediately before the observation. Although this is in
agreement with the customary usage of quantum physics, the conscious
emphasis on this fact led the author himself to inquire whether one could
not formulate quantum physics in such a way that the result of an
observation can be used as the "retrodictive state" just before the
observation. [2] This question will be answered in Section 5 of this paper.
Although the answer is in the affirmative, the actual usefulness of such a
retrodictive theory is extremely limited.

1. An article contributed by the author to the monograph, Louis de Broglie,
physicien et penseur (Albin Michel, Paris, 1952) p. 385.

The second motive stemmed from an enlightening illustration that
Dr. Keith Symon chose in a conversation to explain the reason why the
H-theorem cannot be used for the past. He imagines that a man discovers on
a desk two piles of playing cards, one in a perfect order and another in
disorder. In spite of the fact that every permutation of cards has the
same a priori probability, he would not guess that the well-ordered pile
is a result of shuffling, but he would justifiably infer a selective human
intervention in the past of the well-ordered pile. Keeping in mind that a
"permutation of cards" corresponds to a quantum state, "well-ordered-ness"
and "disordered-ness" to macroscopic cells, and "shuffling" to ergodic
process, the reader will find that this pattern of inference is given a
mathematical expression in our formulation of retrodiction in Section 4.

Thirdly, everyone familiar with the quantum theory of time-reversal [3]
is rather disturbed by the fact that the operation of time-reversal is
not a linear transformation and also by the fact that the operation of
double time-reversal does not become an identity transformation. One could
expect that these esthetically unwelcome features of the theory may be
avoided by a formulation which treats prediction and retrodiction on an
equal footing. It will be shown in Section 5 that this expectation is
justified.

2. It is a pleasure for the author to note with thanks that Dr. Adolf
Grünbaum in a private communication encouraged undertaking clarification of
this question.

3. S. Watanabe, Phys. Rev. 84, 1008 (1951). See also: S. Watanabe, Rev.
Mod. Phys. 27, in press.

The problem of retrodiction may be formulated in brief as follows:
An observer B would like to guess from his own experimental data the result
of another observer A who observed the system some time before B and who
has not confided his result to B. The main difficulty for retrodictor B
arises from the fact that in his retrodictive inference he has to assume,
apart from his own experimental finding, an a priori probability for each
possible initial state (in which A might have found the system). There is
in general no reason to assume an equal a priori probability for each
quantum state except when the initial ensemble given to A can justifiably
be assumed to be the result of an ergodic process. It will be explained
in Section 3 by a simple example how easily a retrodictor can completely
fail while a predictor cannot fail, naturally in the statistical sense of
the word.

However, by assuming the uniform a priori initial probability, one
can obtain an interesting formalism which exhibits on one hand a complete
symmetry with respect to the two directions of "time", but which on the
other manifests a definite one-way-ness of the direction of human
"inference". In short, the present paper may be said to be an elaboration in
the light of quantum physics of the following pregnant words due to W. Gibbs: [4]

"It should not be forgotten, when our ensembles are chosen to
illustrate the probabilities of events in the real world, that while the
probabilities of subsequent events may often be determined from the
probabilities of prior events, it is rarely the case that probabilities of prior
events can be determined from those of subsequent events, for we are rarely
justified in excluding the consideration of the antecedent probability of
the prior events."

4. J. W. Gibbs, Elementary Principles in Statistical Mechanics (Yale
University Press, New Haven, 1914) p. 150.




#2.  Microscopic  Retrodiction 

Let $S$ be a complete set of eigenstates:

$$ S_1, S_2, \ldots, S_i, \ldots \tag{2.1} $$

of a family of mutually commuting observables defined with respect to a
certain physical system. We shall use the same symbol $S$ also to designate
this family of observables. The completeness of $S$ implies that the
probability $p_i$ of the system being found in state $S_i$ satisfies

$$ \sum_i p_i = 1 . \tag{2.2} $$

A family $T$ of mutually commuting observables which do not commute with $S$
will define another complete set $T$ of eigenfunctions.

The probability that the system which was in state $S_i$ at the initial
instant will be found in state $T_j$ after $\tau$ seconds will be denoted by

$$ P(S_i \to T_j; \tau) = P(i \to j) , \tag{2.3} $$

^     '         J        ;  (2.4) 

where $S$ and $T$ may or may not be the same complete set. On account of the
assumed completeness, we have

$$ \sum_j P(i \to j) = 1 . \tag{2.5} $$

From the invariance of dynamical laws for time-reversal (reversibility)
or from the invariance for space-and-time inversion (inversibility) we can
conclude the inverse normalization: [3]

$$ \sum_i P(i \to j) = 1 . \tag{2.6} $$

This can also be derived from the unitarity of the transition matrix.
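The two normalizations can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a hypothetical three-state system whose transition matrix is built as a product of elementary two-state rotations (the helper names `givens`, `matmul` and `transition_probabilities` are illustrative), and it takes $P(i \to j)$ to be the squared modulus of the matrix element connecting $S_i$ to $T_j$.

```python
import cmath
import math


def givens(n, a, b, theta, phi):
    """n x n unitary mixing basis states a and b; identity on the rest."""
    u = [[1.0 + 0j if r == c else 0j for c in range(n)] for r in range(n)]
    u[a][a] = math.cos(theta)
    u[b][b] = math.cos(theta)
    u[a][b] = -cmath.exp(1j * phi) * math.sin(theta)
    u[b][a] = cmath.exp(-1j * phi) * math.sin(theta)
    return u


def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[r][k] * y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def transition_probabilities(u):
    """P[i][j] = |U[j][i]|^2: probability of going from state i to state j."""
    n = len(u)
    return [[abs(u[j][i]) ** 2 for j in range(n)] for i in range(n)]


# Arbitrary (assumed) angles; any product of unitaries is unitary.
U = matmul(givens(3, 0, 1, 0.7, 0.3),
           matmul(givens(3, 1, 2, 1.1, -0.5), givens(3, 0, 2, 0.4, 2.0)))
P = transition_probabilities(U)

row_sums = [sum(P[i]) for i in range(3)]                        # (2.5)
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]   # (2.6)
print(row_sums, col_sums)
```

Both lists come out equal to one within rounding error, which is the statement that a unitary transition matrix yields a doubly stochastic table of probabilities.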

Suppose that an observer A observes the system at $t = 0$ with the
observable-family $S$, and that observer B observes the same system at $t = \tau$
with the observable-family $T$. "Prediction" consists in the following
posing of the problem on the part of A. Knowing that observer B will observe with
$T$ at $t = \tau$, observer A proposes to guess the result of B on the basis of
his own result. If observer A had the result $S_i$, then his prediction will
be that the probability of B obtaining $T_j$ will be $P(i \to j)$. This means that,
if observer A prepares a large number $N$ of the cases where the result at
$t = 0$ was $S_i$, then $N P(i \to j)$ will be the number of cases where observer B
will obtain $T_j$ at $t = \tau$. Observer A will be called predictor and observer
B monitor.

"Retrodiction" is now to be defined in a close analogy to the
previous problem, only interchanging the roles of A and B. Knowing that
observer A observed the system with $S$ at $t = 0$, but not knowing what his
result was, observer B proposes to infer the result of A from his own result
that the system is found in $T_j$ at $t = \tau$. B will be called retrodictor and
A monitor.

This question does not have a unique answer unless retrodictor B
assumes a certain statistical behavior of monitor A regarding selection of
the initial states. Independently of his own result at $t = \tau$, retrodictor
B may have some general information about A, on the basis of which he may
assume that monitor A has the general habit of selecting (and handing over
to B) states $S_i$ with weight $w_i$ ($\sum_i w_i = 1$). If A prepared a large number $N$
of cases at $t = 0$, then $N w_i$ among them must have been in $S_i$, according to
the assumption. At the receiving end, $N w_i P(i \to j)$ among these $N w_i$ will
turn out to be in $T_j$. The total number of cases which will land in $T_j$
will then be

$$ N \sum_i w_i P(i \to j) , \tag{2.7} $$

among which $N w_i P(i \to j)$ have originated from $S_i$. Then retrodictor B will
say that the probability $Q(i \leftarrow j)$ that a system which was found at $t = \tau$
to be in $T_j$ had been found in $S_i$ at $t = 0$ is

$$ Q(i \leftarrow j) = \frac{w_i P(i \to j)}{\sum_k w_k P(k \to j)} . \tag{2.8} $$

We use the "left-to-right" order to indicate the chronological direction,
and an arrow to indicate the direction of inference.

It should be clearly understood that the above result does not mean
at all that if A prepared an ensemble with the weight given by (2.8) for
each $S_i$, then retrodictor B would obtain the result $T_j$. Indeed, if A
started with the weight distribution (2.8), then B would obtain $T_k$ with
weight:

$$ \frac{\sum_i w_i P(i \to j) P(i \to k)}{\sum_i w_i P(i \to j)} . $$

In other words, (2.8) represents the weight of $S_i$ in the subset of systems
ending in $T_j$ when the entire ensemble has the weight distribution $w_i$.
Insofar as the estimation of $w_i$ is correct, observer B's retrodiction based on
(2.8) must be statistically successful in this sub-ensemble. If the
estimation of $w_i$ is unreliable, all retrodiction is meaningless.
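The retrodictive probability is an instance of Bayes' rule, with the weights $w_i$ as the prior. A minimal sketch follows, assuming a hypothetical two-state transition table and illustrative weights; the function name `retrodict` is not from the paper.

```python
def retrodict(w, P, j):
    """Posterior over initial states i, given that the final state is j.

    w[i]    : assumed a priori weight of initial state i
    P[i][j] : transition probability from i to j
    """
    joint = [w[i] * P[i][j] for i in range(len(w))]
    total = sum(joint)
    if total == 0:
        raise ValueError("final state j cannot occur under these weights")
    return [x / total for x in joint]


# Hypothetical numbers chosen for illustration only.
P = [[0.8, 0.2],
     [0.2, 0.8]]
w_assumed = [0.5, 0.5]   # what retrodictor B believes about monitor A
w_actual = [0.9, 0.1]    # what monitor A really does

q_assumed = retrodict(w_assumed, P, 0)
q_actual = retrodict(w_actual, P, 0)
print(q_assumed, q_actual)
```

The two posteriors differ whenever the assumed and actual weights differ, which is the sense in which the retrodiction is only as reliable as the estimate of the $w_i$.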

If retrodictor B does not have any preliminary knowledge about the
habit of A, the only thing he can do is to resort to the principle of
ignorance and to assume that the a priori probability $w_i$ is equal for each
quantum state $S_i$. This attitude of B will be successful (verifiable by
repetition) only if A prepares an ensemble with equal weight for all
quantum states (similar to the microcanonical ensemble on an energy shell) and
if B picks up only those cases which have landed in $T_j$ and then classifies
them according to various possible initial states. According to this
simplifying assumption, (2.8) will become

$$ Q_0(i \leftarrow j) = \frac{P(i \to j)}{\sum_k P(k \to j)} , \tag{2.9} $$

and, further with the help of the inverse normalization (2.6),

$$ Q_0(i \leftarrow j) = P(i \to j) . \tag{2.10} $$

It should be well noted that (2.9) and (2.10) are based on a specific
assumption that $w_i$ is uniform. In fact, retrodictor B can very easily be
"fooled" by monitor A. Suppose for instance that there are only two
possible states (1) and (2) and $P(1 \to 1) = P(1 \to 2) = P(2 \to 1) = P(2 \to 2) = 1/2$. No matter
what ratio $w_1/w_2$ monitor A may choose, retrodictor B will find one half of
the cases in state (1) and the other half in state (2). Conversely,
observer B's retrodiction based on the equal distribution that one half of the
cases must have originated from (1) and the other half from (2) may be
completely wrong: monitor A may have handed over to retrodictor B only those
systems which were found by A to be in (1) at $t = 0$. The best way to avoid
this deception on the part of B would be to impose on A, as a rule of the
game, that he should pick up cases at random from the microcanonical ensemble.
Then (2.9) or (2.10) will have a meaning in a sub-ensemble which lands in
$T_j$. We shall hereinafter refer to the retrodiction based on the uniform
$w_i$'s as a "blind retrodiction."
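The two-state example can be played out as a small simulation. This is a sketch under stated assumptions: states (1) and (2) are indexed 0 and 1, the monitor hands over only state (1), and the helper `simulate` and the fixed random seed are illustrative choices, not part of the paper.

```python
import random


def simulate(n_cases, w, P, rng):
    """Return counts[i][j]: cases found by A in state i and later by B in j."""
    states = range(len(w))
    counts = [[0 for _ in states] for _ in states]
    for _ in range(n_cases):
        i = rng.choices(states, weights=w)[0]      # A's selection habit
        j = rng.choices(states, weights=P[i])[0]   # transition to B's result
        counts[i][j] += 1
    return counts


P = [[0.5, 0.5],
     [0.5, 0.5]]
rng = random.Random(0)

# Monitor A hands over only systems found in state (1), i.e. index 0.
counts = simulate(10_000, [1.0, 0.0], P, rng)

landed = counts[0][0] + counts[1][0]            # cases B finds in state (1)
true_fraction = counts[0][0] / landed           # actual share that came from (1)
blind_guess = P[0][0] / (P[0][0] + P[1][0])     # uniform-weight retrodiction
print(true_fraction, blind_guess)
```

B's own statistics look the same for any choice of weights, so nothing in his data warns him that the blind guess of one half is wrong.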

Prediction, in contrast to retrodiction, has a simpler rule of the game:
the monitor (posterior observer) is required to show all his results. Then
the prediction based on (2.3) will always be statistically successful. It
should be emphasized that this asymmetry between prediction and retrodiction
originates from the asymmetry of the "rules of the game": in prediction, the
predictor has the right to prepare the ensemble, while in retrodiction,
the monitor has the right. We can easily change the rules to make
prediction just as unreliable as retrodiction. Suppose monitor B referring to a
prediction has the tendency to forget to record some of the cases in such a
way that the chance of state $T_j$ being recorded by him is proportional to
$v_j$. Then the prediction by A will be that the systems registered by him
as $S_i$ will be recorded by monitor B with the distribution given by

$$ \frac{v_j P(i \to j)}{\sum_k v_k P(i \to k)} , \tag{2.11} $$

which offers a nice parallelism to (2.8). We shall however seldom have to
deal with such a "forgetful" observer. The "rules of the game" must be chosen
in each case in such a way that they correspond faithfully to the nature of
the actual description of physical phenomena under consideration. In this
sense, $P(i \to j)$ of (2.3) may be used for prediction, but for retrodiction we have
to use $Q(i \leftarrow j)$ of (2.8) with indeterminate $w_i$ in the general case.

An exception has to be made to the entire consideration of this section
either if (1) $S$ and $T$ are the same set and it commutes with the Hamiltonian
of the system or if (2) $S$ and $T$ are the same and the time duration $\tau$ is
zero. In this case, $P(i \to j) = \delta_{ij}$ and equation (2.10) follows automatically
from (2.8) irrespective of the $w_i$'s:

$$ Q(i \leftarrow j) = P(i \to j) = \delta_{ij} . \tag{2.12} $$

Retrodiction is perfectly successful in this special case.

The situation in classical physics may be included in this case, if
the "state" is determined as precisely as possible in principle, i.e., if
the system is located at a point in the phase space.
#3.  Macroscopic  Retrodiction

We shall now introduce the concept of macroscopic cells in our
consideration. The macroscopic observations are known to be compatible with one
another, therefore we can think of a family $S$ of microscopic observables
which are compatible with all these macroscopic observations. In general,
this $S$ will not be commutable with the exact Hamiltonian of the system. [5]
Suppose further that the eigenstates, $S_1$, $S_2$, etc., are grouped into
macroscopic cells which are labeled by $\mu = 1, 2, \ldots$, in such a way that cell
$\mu = 1$ contains $n_1$ eigenstates of $S$, cell $\mu = 2$ contains $n_2$ eigenstates
of $S$, etc.

A macroscopic prediction consists in inferring the probability of
finding the system in cell $\nu$ at $t = \tau$, when it is known that the system was
found in cell $\mu$ at $t = 0$. The answer will be, in terms of the microscopic
transition probabilities,

$$ P(\mu \to \nu) = \frac{1}{n_\mu} \sum_i^{(\mu)} \sum_j^{(\nu)} P(i \to j) , \tag{3.1} $$

where $\sum_i^{(\mu)}$ means that $i$ should run over all the eigenstates contained in
cell $\mu$. It should be noted that this answer is based on the equal weight
of $S_i$ within cell $\mu$, and complete disorder of phase among these states $S_i$.
In other words, we are taking as the initial state a density matrix
(statistical ensemble) which corresponds to the Hilbert subspace $\mu$. Writing $\pi[\varphi]$
for the projection operator for quantum state $\varphi$, we can express our initial
ensemble by

$$ \frac{1}{n_\mu} \sum_i^{(\mu)} \pi[S_i] . \tag{3.2} $$

This is the best we can do under the given information that the system was
found in $\mu$ at $t = 0$. (3.1) is of course normalized with regard to $\nu$, i.e.,

$$ \sum_\nu P(\mu \to \nu) = 1 . $$

5. The exact Hamiltonian may commute with the "macroscopic energy" but not
with the other macroscopic quantities. J. v. Neumann, ZS. f. Phys. 57, 30
(1929).

Now the retrodiction consists in inferring the probability that the
system had been found in macroscopic state $\mu$ at $t = 0$, when it is known
that the system was found to be in macroscopic state $\nu$ at $t = \tau$. Again
introducing the a priori probability $w_i$ for each quantum state in cell $\mu$,
one will answer that the probability in question is given by

$$ Q(\mu \leftarrow \nu) = \frac{\sum_i^{(\mu)} \sum_j^{(\nu)} w_i P(i \to j)}{\sum_i \sum_j^{(\nu)} w_i P(i \to j)} , \tag{3.3} $$

where the sum over $i$ in the denominator runs over all quantum states.
If we can assume that the background ensemble of A was a microcanonical
ensemble, i.e., if all the $w_i$'s are equal, then we can simplify (3.3) to
the form:

$$ Q_0(\mu \leftarrow \nu) = \frac{1}{n_\nu} \sum_i^{(\mu)} \sum_j^{(\nu)} P(i \to j) = \frac{n_\mu}{n_\nu} P(\mu \to \nu) , \tag{3.4} $$

where the inverse normalization (2.6) has been utilized. The probability
(3.3) or (3.4) satisfies the normalization condition with regard to $\mu$:

$$ \sum_\mu Q(\mu \leftarrow \nu) = 1 . $$

It might appear as if prediction in the macroscopic case were equally
unreliable as retrodiction since we have to use the assumption of equal
probability (within cell $\mu$) also for prediction here. However, it should
not be forgotten that it is observer-predictor A himself who prepares the
initial ensemble; therefore, unless he puts uneven, selective weights on
various states within cell $\mu$, he can succeed. On the other hand, the supposed
even weight all over the energy shell assumed in (3.4) is not in the control
of observer-retrodictor B, who therefore can very easily fail in his
retrodiction.
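The cell-level probabilities can be illustrated with a toy model. The sketch below assumes a hypothetical three-state system with a doubly stochastic transition table, grouped into one two-state cell and one single-state cell; the function and cell names are illustrative only.

```python
def macro_prediction(P, cells, mu, nu):
    """Probability of landing in cell nu, with equal weight inside cell mu."""
    total = sum(P[i][j] for i in cells[mu] for j in cells[nu])
    return total / len(cells[mu])


def macro_blind_retrodiction(P, cells, mu, nu):
    """Probability of having come from cell mu, uniform weight on all states.

    Relies on every column of P summing to one (inverse normalization).
    """
    total = sum(P[i][j] for i in cells[mu] for j in cells[nu])
    return total / len(cells[nu])


# Hypothetical doubly stochastic table: every row and column sums to one.
P = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
cells = {"big": [0, 1], "small": [2]}

pred_norm = sum(macro_prediction(P, cells, "big", nu) for nu in cells)
retro_norm = sum(macro_blind_retrodiction(P, cells, mu, "small") for mu in cells)
print(pred_norm, retro_norm)
```

Both sums equal one, the first because of the row normalization and the second only because the columns of this particular table also sum to one.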

Let us next examine the consequences of reversibility (invariance for
time-reversal) and inversibility (invariance for space-and-time inversion)
on our problem. The reversed state $S'$ of a state $S$ means the one in which
all the particles have the same positions as in $S$ but the equal and opposite
velocities to those in $S$. The inversed state $S'$ of a state $S$ means the
one in which all the particles have the same velocities as in $S$ but the
space-inverted positions as compared with $S$. [6] Reversibility and
inversibility, which hold in the basic processes in quantum mechanics, then mean

$$ P(S \to T) = P(T' \to S') . \tag{3.5} $$

The macroscopic observations usually [7] cannot distinguish a state from
its reversed or inversed state. In other words, a cell $\mu$ contains the
reversed, as well as inversed, state $S'$ of a state $S$ if it contains $S$. Then
from (3.1), we obtain in virtue of (3.5),

$$ P(\mu \to \nu) = \frac{1}{n_\mu} \sum_i^{(\mu)} \sum_j^{(\nu)} P(j \to i) = \frac{n_\nu}{n_\mu} P(\nu \to \mu) . \tag{3.6} $$

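Using this relation, we can write (3.3) in the form

$$ Q(\mu \leftarrow \nu) = \frac{W_\mu P(\nu \to \mu)}{\sum_{\mu'} W_{\mu'} P(\nu \to \mu')} , \tag{3.7} $$

where $W_\mu$ denotes the a priori weight $w_i$, taken to be the same for every
quantum state within cell $\mu$, and with the assumption of uniform weight,

$$ Q_0(\mu \leftarrow \nu) = P(\nu \to \mu) , \tag{3.8} $$

which has a striking simplicity. (3.4) and (3.8) are applicable only to the
"blind" macroscopic retrodiction.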

What has been developed in this section also applies to classical
statistical considerations if we replace the number of quantum states by the
volume in the phase space.
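The passage from the uniform-weight retrodiction to the simple result that the blind retrodictive probability equals the reverse transition probability can be spelled out in three steps. The following is a sketch of that chain, writing $n_\mu$ and $n_\nu$ for the numbers of quantum states in the two cells and assuming, as in the text, that each cell contains the reversed partner of every state it contains.

```latex
\begin{align*}
Q_0(\mu \leftarrow \nu)
  &= \frac{1}{n_\nu} \sum_{i \in \mu} \sum_{j \in \nu} P(i \to j)
     && \text{uniform } w_i \text{ and } \textstyle\sum_i P(i \to j) = 1 \\
  &= \frac{n_\mu}{n_\nu}\, P(\mu \to \nu)
     && \text{definition of the macroscopic } P(\mu \to \nu) \\
  &= P(\nu \to \mu)
     && \text{since } n_\mu P(\mu \to \nu) = n_\nu P(\nu \to \mu).
\end{align*}
```

The last step is the only place where reversibility (or inversibility) enters; the first two use nothing beyond the normalization of the microscopic transition probabilities.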

6. For the precise definition of time-reversal and space-and-time inversion,
see Sections 3, 4, Part I, and Section 3, Part II, S. Watanabe, Rev. Mod. Phys.
27, in press.

7. This is certainly the case if we limit the macroscopic quantities to a
certain category, for instance to the thermodynamical variables.
#4.  Application  of  the  H-Theorem  to  the  Past

The probability $P(\nu \to \mu)$ depends naturally on the length of the
interval $\tau$ between the two observations. The ergodic H-theorem, in essence,
states that if the exact Hamiltonian does not commute with the
observable-family $S$ which is compatible with the macroscopic observations, [8] then the
probability $P(\nu \to \mu)$ averaged over possible values of $\tau$ is proportional
to the size of the final cell $\mu$:

$$ \langle P(\nu \to \mu) \rangle = n_\mu / K , \tag{4.1} $$

where $K$ is the total number of quantum states on the energy shell.

Eq. (4.1) shows that if we take a value of $\tau$ arbitrarily from its
possible domain, $0 < \tau < \infty$, the probability $P(\nu \to \mu)$ of finding the
system in a large cell $\mu$ is large. Invoking now the relation (3.8), we can
say that on the assumption of blind retrodiction, the probability $Q_0(\mu \leftarrow \nu)$
that the system had been found at $t = -\tau$ in a large cell $\mu$ is also large if
$-\tau$ is arbitrarily taken from its possible domain $-\infty < -\tau < 0$. [9] If we use
the Boltzmannian entropy $S_B$:

$$ S_B = \ln n_\mu , \tag{4.2} $$

we can say on the basis of blind retrodiction that if $S_B$ at $t = 0$ has a
certain non-maximum value then it is just as probable to have a larger entropy
value in the future as in the past. This is the well-known conclusion of a
formal application of the H-theorem to the past. We could also use the
Gibbsian entropy $S_G$:

$$ S_G(+\tau) = -\sum_\mu P(\nu \to \mu) \ln [P(\nu \to \mu)/n_\mu] , \tag{4.3} $$

$$ S_G(-\tau) = -\sum_\mu Q(\mu \leftarrow \nu) \ln [Q(\mu \leftarrow \nu)/n_\mu] , \tag{4.4} $$

with $\tau > 0$, but it may be easier to visualize the situation with the help of
the Boltzmannian entropy.

8. Although the non-commutability of the exact Hamiltonian with $S$ is the
main hypothesis, we need some more auxiliary conditions to derive this
result. For the two versions of these conditions, see J. von Neumann,
ZS. f. Phys. 57, 30 (1929); and W. Pauli and M. Fierz, ZS. f. Phys.
106, 572 (1937).

9. The time-average in v. Neumann's proof can be only for the positive
values of $\tau$. It should be noted also that the H-theorem considered
here refers to one initial observation ($t = 0$) and one final
observation ($t = \tau$) and is different from the consideration based on repeated
observations. See Section 7, Part I, S. Watanabe, Rev. Mod. Phys. 27, in
press.
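The relation between the two entropies can be made concrete with a short calculation. This is a sketch under stated assumptions: the Boltzmannian entropy of a cell is taken as the logarithm of its number of states (units with Boltzmann's constant equal to one), the Gibbsian entropy as minus the sum of $p \ln(p/n)$ over cells, and the cell sizes are hypothetical.

```python
import math


def boltzmann_entropy(n_cell):
    """Entropy of a single cell containing n_cell quantum states."""
    return math.log(n_cell)


def gibbs_entropy(prob, sizes):
    """-sum_mu p_mu ln(p_mu / n_mu); terms with p_mu = 0 contribute nothing."""
    return -sum(p * math.log(p / n) for p, n in zip(prob, sizes) if p > 0)


sizes = [1, 9, 90]               # hypothetical cell sizes
K = sum(sizes)                   # states on the energy shell

certain = [0.0, 1.0, 0.0]        # system known to be in the second cell
ergodic = [n / K for n in sizes] # time-averaged distribution n_mu / K

s_certain = gibbs_entropy(certain, sizes)
s_ergodic = gibbs_entropy(ergodic, sizes)
print(s_certain, boltzmann_entropy(9), s_ergodic, math.log(K))
```

When the system is certainly in one cell the Gibbsian value reduces to the Boltzmannian one, and the ergodic distribution gives the maximum value, the logarithm of the total number of states.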

The above argument is based on the premise of blind retrodiction which
may be the only possible basis of inference if it is perfectly certain that
the system had been isolated from the exterior system (except a possible
prior observer who does not perform any kind of selection) and if we have
absolutely no other information about the system than that it was found in
$\nu$ at $t = 0$. However, such conditions are seldom satisfied in the actual
circumstances. A sounder inference than the mere blind retrodiction, in line
with Symon's idea explained in our Section 1, would be somewhat as follows:

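Consider two cells $\mu$ and $\nu$ such that $n_\mu \gg n_\nu$. Then according to (3.6),
we have

$$ P(\nu \to \mu) \gg P(\mu \to \nu) , \tag{4.5} $$

and according to (4.1) we have also

$$ \frac{\langle P(\mu' \to \nu) \rangle}{\langle P(\mu' \to \mu) \rangle} = \frac{n_\nu}{n_\mu} \ll 1 \qquad (\mu' \text{ arbitrary}) . \tag{4.6} $$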

Now let us assume that we find a system at $t = 0$ in cell $\nu$. Seeing from
(4.5) that it is extremely improbable for a system starting from a large cell
$\mu$ to reach a small cell $\nu$, we suspect that such was not the actual history
behind the system we have just found in $\nu$. This inference is in direct
contradiction to the result mentioned above based on the uniform $w_i$. Thus we
are led to modify the assumption of uniform $w_i$ in such a manner as to give
less weight $W_\mu$ to larger cells $\mu$ and larger weight to smaller cells. Such
an assumption of non-uniform $w_i$ is perfectly allowable according to our
theory. In fact, for instance, if there is any possible doubt about the
isolation of the system in the past, there is no reason to adopt the
hypothesis of blind retrodiction. Then the result of observation that the
system was found in a small cell at present can very well reflect itself in our
estimation of the $w_i$'s.

Once we have abandoned the assumption of uniform $w_i$, we cannot use
(3.8) any longer and have to go back to (3.7). In spite of the fact that
$P(\nu \to \mu)$ may be large, $Q(\mu \leftarrow \nu)$ can be small if $W_\mu$ is small in (3.7). And
the probability of the system having originated from a small cell can become
quite large. Thus the entropy value $S_B$ at $t = -\tau$ may probably have been
smaller. Our formalism is flexible enough to incorporate this very
reasonable inference.
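The card-pile inference can be put in numbers. The sketch below is an illustration only: it assumes two cells, a small "ordered" one and a large "disordered" one, uses the time-averaged transition probability proportional to cell size, and treats the weight attached to each cell as a per-state a priori weight; the specific sizes and weights are hypothetical.

```python
def macro_retrodiction(W, P_from_nu):
    """Posterior over past cells mu: W[mu] * P(nu -> mu), normalized."""
    joint = [w * p for w, p in zip(W, P_from_nu)]
    total = sum(joint)
    return [x / total for x in joint]


sizes = [1, 999]                       # cell 0: ordered, cell 1: disordered
K = sum(sizes)
P_from_nu = [n / K for n in sizes]     # averaged P(nu -> mu) = n_mu / K

# Blind retrodiction: same weight for every quantum state.
blind = macro_retrodiction([1.0, 1.0], P_from_nu)

# Suspected selective intervention: large cells heavily down-weighted.
suspicious = macro_retrodiction([1.0, 1e-6], P_from_nu)
print(blind, suspicious)
```

With uniform weights the large cell dominates the inferred past, whereas a sufficiently small weight on the large cell makes an origin in the small cell the more probable history.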

The above argument can be applied also to classical statistical
mechanics. It is interesting to note how our argument can stand the famous
objection due to Loschmidt. It is true that in cell $\nu$ there are just as
many microstates headed for larger values of entropy $S_B$ in the future as
those which have originated from larger values of entropy in the past. If
$w_i$ is uniform, then each microstate inside cell $\nu$ will be occupied with the
same weight on account of the permanence of the microcanonical ensemble.
Then Loschmidt's argument becomes valid, and we have to conclude larger
values of entropy just as well for the future as for the past. But if $w_i$
is not necessarily uniform, then we need not assume equal weight for each
microstate inside the cell for the purpose of extrapolation towards the
past. Then Loschmidt's objection does not hold any longer. For a
realistic macroscopic retrodiction, we should not use the uniform weight within
the macroscopic cell $\nu$, while it may be assumed for prediction.

It is interesting to note that the blind application of the ergodic
H-theorem to the past does not actually yield any newer information than
what one has put in as the assumption. For the combination of (4.1) and
(3.8) gives

$$ \langle Q_0(\mu \leftarrow \nu) \rangle = n_\mu / K , \tag{4.7} $$

which is nothing but the expression of a uniform probability, an
assumption which has been used in deriving (3.8).
#5.  Retrodictive  Quantum  Mechanics

Our basic equation (2.8) for retrodiction can be written as

$$ Q(i \leftarrow j) = \frac{w_i Q_0(i \leftarrow j)}{\sum_k w_k Q_0(k \leftarrow j)} , \tag{5.1} $$

with the help of (2.10):

$$ Q_0(i \leftarrow j) = P(i \to j) , \tag{5.2} $$

where the $w_i$ depends on the over-all judgment of the retrodictor. Only
when the system has been isolated in the past and there is no other clue to
the past history of the system than the observational fact that the system
is found at present in state $T_j$ will the retrodictor use the uniform
value of $w_i$ for various $i$'s, and $Q(i \leftarrow j)$ will reduce to $Q_0(i \leftarrow j)$. What
follows mainly concerns the blind retrodiction represented by $Q_0$, but $Q$
can be derived from $Q_0$ by the use of (5.1) if there is any way of estimating
$w_i$.

In this section, we shall first show that the quantity given in (5.2)
can be calculated in two ways: either solving the Schrödinger equation with
the initial condition $S_i$, or solving the same equation with the final
condition $T_j$. Although the resulting values of probability are the same, the
first method agrees better with the idea suggested by the right hand side
of (5.2), while the second method reflects more faithfully the idea
suggested by the left hand side. Since the first method is the customary one,
we shall only show how the second method can be used to evaluate the same
probability.

Let the eigenfunctions of $\boldsymbol{\Phi}$ be called $\varphi_1, \varphi_2, \ldots, \varphi_i, \ldots$
and those of $\boldsymbol{\Psi}$, $\psi_1, \psi_2, \ldots, \psi_j, \ldots$. Let further the
solution of the Schrödinger equation

$$ \partial \Phi(t) / \partial t = -i H(t) \Phi(t) $$  (5.3)

satisfying the final condition:

$$ \Phi(\tau) = \psi_j $$  (5.4)

be denoted by $\Phi_r(t)$. Expanding $\Phi_r(0)$ according to the $\varphi_i$'s,

$$ \Phi_r(0) = \sum_i c_i \varphi_i , $$  (5.5)

we can easily show that $|c_i|^2$ represents the probability (5.2).
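Consider the transition matrix $U(t_1, t_2)$ defined by:

$$ \partial U(t_1,t_2)/\partial t_1 = -i H(t_1) U(t_1,t_2) , \quad \partial U(t_1,t_2)/\partial t_2 = i U(t_1,t_2) H(t_2) , $$  (5.6)

$$ U(t_1,t_1) = 1 , \quad U^{-1}(t_1,t_2) = U^{\dagger}(t_1,t_2) = U(t_2,t_1) . $$  (5.7)

Then according to the customary theory, the probability (5.2) is given by

$$ P(i \rightarrow j) = |(\psi_j, U(\tau,0) \varphi_i)|^2 . $$  (5.8)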

On the other hand, $\Phi_r(t)$ considered above is

$$ \Phi_r(t) = U(t,\tau) \psi_j $$  (5.9)

and the coefficients $c_i$ are

$$ c_i = (\varphi_i, U(0,\tau) \psi_j) . $$  (5.10)

On account of the unitarity of $U$ and of the relation $U(0,\tau) = U^{-1}(\tau,0)$,
(5.7), we obtain

$$ c_i = (U(\tau,0) \varphi_i, \psi_j) = (\psi_j, U(\tau,0) \varphi_i)^* . $$  (5.11)

Hence, in view of (5.8),

$$ |c_i|^2 = P(i \rightarrow j) = Q_0(i \leftarrow j) . $$  (5.12)
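The equality of the two methods can be checked numerically. The sketch below is only an illustration under stated assumptions: a two-level system with the time-independent Hamiltonian $H = \omega \sigma_x$ (units with $\hbar = 1$), for which the transition matrix has the closed form $\cos\theta - i \sigma_x \sin\theta$ with $\theta = \omega (t_1 - t_2)$. The two complete sets, the value of $\omega$, and all function names are hypothetical choices, not part of the text.

```python
import math


def inner(a, b):
    """Scalar product (a, b), antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def apply(m, v):
    """Matrix-vector product for small matrices stored as nested lists."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


def U(t1, t2, omega=0.7):
    """U(t1, t2) = exp(-i H (t1 - t2)) for the assumed H = omega * sigma_x."""
    theta = omega * (t1 - t2)
    c, s = math.cos(theta), math.sin(theta)
    return [[complex(c, 0.0), complex(0.0, -s)],
            [complex(0.0, -s), complex(c, 0.0)]]


# Hypothetical complete sets: phi_i is the standard basis,
# psi_j is a rotated orthonormal basis carrying a relative phase.
a, b = 0.4, 1.1
phase = complex(math.cos(b), math.sin(b))
phi = [[1 + 0j, 0j], [0j, 1 + 0j]]
psi = [[math.cos(a) + 0j, phase * math.sin(a)],
       [-phase.conjugate() * math.sin(a), math.cos(a) + 0j]]

tau = 2.0


def forward(i, j):
    """Customary method: propagate phi_i from 0 to tau, then project on psi_j."""
    return abs(inner(psi[j], apply(U(tau, 0.0), phi[i]))) ** 2


def backward(i, j):
    """Second method: propagate psi_j from tau back to 0, then expand in the phi_i."""
    c_i = inner(phi[i], apply(U(0.0, tau), psi[j]))
    return abs(c_i) ** 2


for i in range(2):
    for j in range(2):
        print(i, j, round(forward(i, j), 12), round(backward(i, j), 12))
```

For each pair of indices the two printed numbers agree to rounding error, which is the content of the relation between $|c_i|^2$ and the customary transition probability in this toy case.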

This situation suggests a new picture of the "state" of a system
between two observations, one at $t=0$ and the other at $t=\tau$: there exist
simultaneously two states, one being a predictive state $\Phi_p(t)$ which
complies with the initial condition at $t=0$, and the other a retrodictive
state $\Phi_r(t)$ which complies with the final condition at $t=\tau$. Both
$\Phi_p(t)$ and $\Phi_r(t)$ obey the same Schrödinger equation. This picture,
though redundant in practical applications, offers a certain intellectual
interest, for it provides a complete symmetry between two consecutive
observations.

Now, we should like to look upon the same situation from a slightly
different point of view, namely we attempt to establish a time-symmetry,
not with regard to two observations, but with regard to the future and past
referring to a single observation at hand. Suppose we make an observation
at $t=0$ and obtain a result $\varphi_i$. Then our inference will develop
towards the future just as well as towards the past. Let us introduce a new
variable $s$, called inference parameter, which coincides with $t$ when it
refers to prediction, and which is equal to minus $t$ when it refers to
retrodiction. $s$ is then always positive.

The development of a retrodictive state $\Phi_r(t)$ starting backward from
$\varphi_i$ at $t=0$ is nothing but the extrapolation of the predictive state
and obeys

$$ \partial \Phi_r(t) / \partial t = -i H(t) \Phi_r(t) , \quad \Phi_r(0) = \varphi_i , $$  (5.13)

or in terms of $s$,

$$ \partial \Phi_r(-s) / \partial s = i H(-s) \Phi_r(-s) . $$  (5.14)

The Hamiltonian being hermitian, the complex conjugate of (5.14) becomes

$$ \partial \Phi_r^*(-s) / \partial s = -i \Phi_r^*(-s) H(-s) . $$  (5.15)

Introducing a time-independent unitary operator $R$, called reversion
operator, such that

$$ (R^{-1} H(-s) R)^* = H(s) , $$  (5.16)

we can rewrite (5.15) in the form:

$$ \partial R^T \Phi_r^*(-s) / \partial s = -i H(s) R^T \Phi_r^*(-s) . $$  (5.17)

This means that $\Phi(s)$ defined by

$$ \Phi(s) = R^T \Phi_r^*(-s) , \quad \text{or} \quad \Phi_r(-s) = R \Phi^*(s) , $$  (5.18)

satisfies the same equation as the predictive state:

$$ \partial \Phi(s) / \partial s = -i H(s) \Phi(s) . $$  (5.19)

The only difference is that $\Phi(s)$ satisfies the initial condition:

$$ \Phi(0) = R^T \varphi_i^* , $$  (5.20)

while the predictive state $\Phi_p(s)$ satisfies

$$ \Phi_p(0) = \varphi_i . $$  (5.21)
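The statement that the conjugated, reversed retrodictive state obeys the predictive equation can be illustrated in the simplest possible setting. The sketch below assumes a two-level system with the real, time-independent Hamiltonian $H = \omega \sigma_x$ (units with $\hbar = 1$), for which the reversion operator may be taken as the identity; this choice of $R$, the value of $\omega$, and the observed state are assumptions made only for the illustration.

```python
import math


def evolve(state, t, omega=0.7):
    """Apply exp(-i H t) with the assumed H = omega * sigma_x to a two-component state."""
    c, sn = math.cos(omega * t), math.sin(omega * t)
    a, b = state
    return [c * a - 1j * sn * b, -1j * sn * a + c * b]


def conj(state):
    """Complex conjugate of every component."""
    return [z.conjugate() for z in state]


# Hypothetical normalized result of the observation at t = 0.
phi_i = [complex(0.6, 0.0), complex(0.0, 0.8)]
s = 1.3

# Retrodictive state extrapolated backward to t = -s.
retro_at_minus_s = evolve(phi_i, -s)

# Inferential state at parameter s, with the reversion operator taken as the identity.
mapped = conj(retro_at_minus_s)

# Ordinary forward solution at time s, started from the conjugated initial state.
forward_from_conj = evolve(conj(phi_i), s)

print(mapped)
print(forward_from_conj)
```

The two printed vectors coincide to rounding error: in this toy case the backward-extrapolated state, once conjugated, is indistinguishable from a forward solution with the conjugated initial condition.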

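The probability of finding this system at $t = s > 0$ in state $\psi_j$ will be

$$ P(i \rightarrow j) = |(\psi_j, \Phi_p(s))|^2 , $$  (5.22)

while the blind probability that the system had been found at $t = -s < 0$ in
state $\psi_j$ will be

$$ Q_0(j \leftarrow i) = |(\psi_j, \Phi_r(-s))|^2 = |(R^T \psi_j^*, \Phi(s))|^2 . $$  (5.23)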

In brief, the two inferential states $\Phi_p(s)$ and $\Phi(s)$ can be treated
in a parallel fashion, only using $R^T \varphi_i^*$ for $\Phi(s)$ wherever we
would use $\varphi_i$ for $\Phi_p(s)$. Compare (5.20) and (5.23) respectively
with (5.21) and (5.22). It would then be a tempting idea to introduce a
quantity which comprises both $\Phi_p$ and $\Phi$ on the same footing. A
"double inferential state" composed of two components:

$$ \Psi(s) = \begin{pmatrix} \Phi_p(s) \\ \Phi(s) \end{pmatrix} $$  (5.24)

will obey the Schrödinger equation:

$$ \partial \Psi(s) / \partial s = -i \mathbf{H}(s) \Psi(s) , $$  (5.25)

with

$$ \mathbf{H}(s) = \begin{pmatrix} H(s) & 0 \\ 0 & H(s) \end{pmatrix} . $$  (5.26)

The initial condition of $\Psi(s)$ is

$$ \Psi(0) = \begin{pmatrix} \varphi_i \\ R^T \varphi_i^* \end{pmatrix} $$  (5.27)

and the solution of (5.25) at an arbitrary value of $s$ will then have the
form:

$$ \Psi(s) = \begin{pmatrix} \Phi_p(s) \\ R^T \Phi_r^*(-s) \end{pmatrix} . $$  (5.28)

We can however liberalize the relationship between the two components of
(5.28) without affecting their physical meaning. Namely, taking any unitary
operator $W$ which commutes with all the known physical quantities, we can
write, instead of (5.28),

$$ \Psi(s) = \begin{pmatrix} \Phi_p(s) \\ R^T (W \Phi_r(-s))^* \end{pmatrix} . $$  (5.29)

This amounts to replacing the retrodictive state $\Phi_r(-s)$ by $W \Phi_r(-s)$,
which of course does not change the content of a retrodiction. The initial
condition of (5.29) is then

$$ \Psi(0) = \begin{pmatrix} \varphi_i \\ R^T (W \varphi_i)^* \end{pmatrix} . $$  (5.30)


To make our discussion more concrete let us take as $W$

$$ W = \Delta^n , $$  (5.31)

where $n$ is any arbitrary integer and $\Delta$ is given by$^{10}$

$$ \Delta^{-1} = \Delta^T = \Delta^* = R R^* = \prod_i (-1)^{N_i} , $$  (5.32)

in which $N_i$ is the occupation number operator for the spinor eigenstate
labeled $i$. $\Delta$ is known to commute with the reversion operator, $R$.
Then the general pattern of a double inferential function is

$$ \Psi(s) = \begin{pmatrix} \Phi_p(s) \\ \Delta^n R^T \Phi_r^*(-s) \end{pmatrix} , \quad (s > 0) , $$  (5.33)

with arbitrary $n$. $\Delta^n$ is unity when $n$ is even.

10. See Section 12, Part II, S. Watanabe, Rev. Mod. Phys. 27, in press.

We can now introduce the "reversed" inferential function $\Psi_R(s)$ of (5.33)

(5.34)

which certainly falls in the supposed general pattern of an inferential
function (5.33), only the arbitrary number $n$ being replaced by $n+1$, for

J(s;  =  ^   *r(ianRT  **(-*))*  (5.35) 

Furthermore, at each value of $s$, the first and the second components of
(5.34) represent respectively the so-called "reversed states" of the first
and the second components of (5.33). Indeed, for a given state $\Phi(t)$, its
reversed state can be expressed by $W R^T \Phi^*(-t)$. The transformation from
(5.33) to (5.34) can be written

$$ \Psi_R(s) = \mathcal{R} \Psi(s) , \quad \Psi(s) = \mathcal{R} \Psi_R(s) , $$  (5.36)

$$ \mathcal{R} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} . $$  (5.37)
The formalism presented here has no practical advantage over the current
quantum theory, but it has a formal advantage in that time-reversal is
represented here by a linear transformation (5.37) and double time-reversal
becomes an identity transformation:

$$ \mathcal{R}^2 = 1 . $$  (5.38)
#6. Irreversibility of Inference and Information

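Suppose observer A prepares a large number $N$ of systems which were found
at $t=0$ in state $\varphi_i$. After $\tau$ seconds, each system will become
$U(\tau,0) \varphi_i$. If observer B performs at $t=\tau$ an observation with the
complete set $\boldsymbol{\Psi} \{\psi_1, \psi_2, \ldots\}$, then
$N |(\psi_j, U(\tau,0) \varphi_i)|^2$ systems will turn out to be in state
$\psi_j$. With the help of projection operators $\mathcal{P}[\psi_j]$ we can
write this process in the following schema:

$$ \mathcal{A}_1 = \mathcal{P}[\varphi_i] \;\rightarrow\; \mathcal{A}_2 = \mathcal{P}[U(\tau,0) \varphi_i] \;\rightarrow\; \mathcal{A}_3 = \sum_j |(\psi_j, U(\tau,0) \varphi_i)|^2 \, \mathcal{P}[\psi_j] . $$  (6.1)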

The amount of "information"$^{11}$ carried by the knowledge about the system
represented by $\mathcal{A}$ is

$$ I = \mathrm{Sp}(\mathcal{A} \ln \mathcal{A}) + \mathrm{const.} $$  (6.2)

This quantity does not change in the first step of transition in (6.1), but
does decrease in the second step. This is the famous irreversibility of
observation pointed out by von Neumann. It should be noted that
$\mathcal{A}_3$ in (6.1) does not represent the knowledge obtained by observer
B in individual cases, for in each case observer B knows perfectly well in
which one of the $\psi_j$'s the system is found. $\mathcal{A}_3$ can be
considered as a global description of the entire ensemble after the
observation, or as the prediction of the result of B in each case.


11. C. Shannon and W. Weaver, Mathematical Theory of Communication (University
of Illinois Press, Urbana, 1949). We do not indulge here in the discussion
regarding the sign before the Spur and regarding the constant in (6.2).
The quantity (6.2) was first used by von Neumann, Mathematische
Grundlagen der Quantenmechanik (Julius Springer, Berlin, 1932). See also
L. Szilard, ZS. f. Phys. 53, 840 (1929). For an early application of the
quantity (6.2) to a concrete physical problem, see S. Watanabe, ZS. f.
Phys. 113, 482 (1939).

Next, suppose that observer A prepares at $t=0$ a large number of systems
with equal weight in all possible $\varphi_i$'s ($\in \boldsymbol{\Phi}$).
Observer B performs at $t=\tau$ an observation with $\boldsymbol{\Psi}$, and a
certain large number $N$ of systems is found to be in state $\psi_j$. He
considers now only those systems ending in $\psi_j$, and asks what percent of
them had been registered as $\varphi_i$ by the previous observer A. Then, he
extrapolates $\psi_j$ backward by the Schrödinger equation from $t=\tau$ to
$t=0$, and calculates $|(\varphi_i, U(0,\tau) \psi_j)|^2$. His inference will
then be that, among the $N$ systems that he found in $\psi_j$,
$N |(\varphi_i, U(0,\tau) \psi_j)|^2$ systems must have been found by A to be
in $\varphi_i$. Schematically, this inference can be denoted by

$$ \mathcal{A}_2 = \sum_i |(\varphi_i, U(0,\tau) \psi_j)|^2 \, \mathcal{P}[\varphi_i] \;\leftarrow\; \mathcal{A}_1 = \mathcal{P}[\psi_j] , $$  (6.3)

which exhibits a parallelism to (6.1). $\mathcal{A}_2$ in (6.3) represents a
partial ensemble immersed in the uniform ensemble

$$ \mathcal{A}_0 = \sum_i \mathcal{P}[\varphi_i] $$  (6.4)

prepared by A. If A had started with $\mathcal{A}_2$, then B would not obtain
$\mathcal{A}_1$. Nonetheless, $\mathcal{A}_2$ represents the legitimate
inference made by B based on the blind retrodiction hypothesis with regard
to the results that A had obtained in the systems which were later found by
B in $\psi_j$.

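If B has any further source of judgment about the initial ensemble,
he will modify the assumption of uniform weight, and attach a reappraised a
priori probability $w_i$ to each $\varphi_i$. In this case, $\mathcal{A}_2$
will become

$$ \mathcal{A}_2 = \sum_i \frac{w_i \, |(\varphi_i, U(0,\tau) \psi_j)|^2}{\sum_k w_k \, |(\varphi_k, U(0,\tau) \psi_j)|^2} \, \mathcal{P}[\varphi_i] $$  (6.5)

in accordance with (2.8) or (5.1).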
It is evident that, no matter whether one uses $\mathcal{A}_2$ of (6.3) or
that of (6.5), the amount of information carried by $\mathcal{A}_2$ is smaller
than that carried by $\mathcal{A}_1$, i.e., the decrease of information here
takes place in the backward direction of time. Both the case of prediction
(6.1) and the case of retrodiction (6.3) or (6.5) can, however, be included
in the statement that the amount of information decreases in the direction
of inference, i.e., in the positive direction of the inference parameter of
the last section. This last result is in good agreement with common sense,
for an inference cannot contain more information than the fact from which
the inference is drawn.
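The decrease of information in the direction of inference can be illustrated with a small numerical sketch. It assumes that the information is the trace of the statistical operator times its logarithm, evaluated from the eigenvalues, with the additive constant dropped and the convention that a vanishing eigenvalue contributes nothing. The function name `information` and the two-state probabilities are hypothetical.

```python
import math


def information(eigenvalues):
    """Trace of (A ln A) computed from the eigenvalues of A; zero eigenvalues contribute 0."""
    return sum(p * math.log(p) for p in eigenvalues if p > 0.0)


# Starting point of the inference: a projection operator (the observed fact).
fact = [1.0, 0.0]

# End point of the inference: a mixture of projection operators whose weights
# are the transition probabilities (hypothetical two-state values).
p = math.cos(0.7) ** 2
inferred = [p, 1.0 - p]

# Limiting case: same complete set and zero elapsed time, so nothing is lost.
unchanged = [1.0, 0.0]

print(information(fact))       # 0.0
print(information(inferred))   # negative
print(information(unchanged))  # 0.0
```

The same arithmetic applies whether the weights are forward transition probabilities (prediction) or their retrodictive counterparts, since in both cases a single projection operator is replaced by a mixture.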

In the statements in the foregoing, the phrase "information decreases"
must be replaced by "information remains constant" in the following two
cases: when (1) $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$ are the same set and
the elapse of time $\tau$ is zero, or (2) $\boldsymbol{\Phi}$ and
$\boldsymbol{\Psi}$ are the same set and commute with the Hamiltonian of the
system.